Journal: Bioinformatics
Article Title: COmic: convolutional kernel networks for interpretable end-to-end learning on (multi-)omics data
doi: 10.1093/bioinformatics/btad204
Figure Legend Snippet: (A) Cross-validation performance of COmic models compared with previously published methods. The boxplots show the 10 mean auROC validation scores of a 10-times repeated 10-fold cross-validation over each of the six breast cancer cohorts. Performance of COmic models is shown in orange (the two rightmost boxes). The center line of each box indicates the median. The height of each box represents the interquartile range (IQR), and the upper and lower whiskers extend up to 1.5 times the IQR. Outliers are depicted as black diamonds. Notches represent the confidence interval (CI) around the median and were calculated by bootstrapping with 10 000 iterations. (B) Visualization of the global interpretation capabilities of a pooling-based COmic model. Each box represents one of the 50 pathways and was created from the pathway weights of the models trained on the six publicly available breast cancer cohorts: GSE11121, GSE1456, GSE2034, GSE2990, GSE4922, and GSE7390. The boxplots are defined as in (A) but without notches (CIs not shown). (C) Visualization of the local interpretation capabilities of an attention-based COmic model. The heatmaps show the attention weights of each of the 50 pathways for three different patients. The model was trained on the GSE11121 cohort. Patient 1 was correctly classified as having a distant metastasis-free survival (DMFS) above 5 years. Patient 2 was correctly classified as having a DMFS below 5 years. Patient 3 was incorrectly classified as having a DMFS above 5 years, although their actual DMFS was below 5 years. More examples can be found in the supplement. (D) Mean training time of pooling-based and attention-based COmic models for datasets of different sizes (100, 1000, 10 000, and 100 000 samples). Training was repeated five times per dataset, and the stars represent the mean training time. The blue and green lines show the results for a fixed batch size of 32 samples per batch. 
The red and yellow lines show the results for an adaptive batch size of 1% of the dataset size (i.e. the batch size was 1 for the dataset with 100 samples and 1000 for the dataset with 100 000 samples). Each model was trained for 200 epochs.
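The notches in panel (A) are described as bootstrapped CIs around the median, and the whiskers as reaching 1.5 times the IQR. The legend does not state the CI level or the bootstrap variant, so the following is a minimal NumPy sketch that assumes a 95% percentile bootstrap (similar to what matplotlib's `boxplot` does when a `bootstrap` count is passed with `notch=True`) and Tukey-style whiskers. The function names `bootstrap_median_ci` and `tukey_whiskers` and the auROC values are hypothetical and not taken from the paper.

```python
import numpy as np


def bootstrap_median_ci(scores, n_boot=10_000, level=0.95, seed=0):
    """Percentile-bootstrap CI for the median (assumed 95% level)."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # Draw n_boot resamples with replacement; each row is one resample.
    idx = rng.integers(0, scores.size, size=(n_boot, scores.size))
    boot_medians = np.median(scores[idx], axis=1)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.percentile(boot_medians, [100 * alpha, 100 * (1 - alpha)])
    return lo, hi


def tukey_whiskers(scores, k=1.5):
    """Whisker ends: most extreme points within k * IQR of the box edges."""
    scores = np.asarray(scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    lower = scores[scores >= q1 - k * iqr].min()
    upper = scores[scores <= q3 + k * iqr].max()
    # Points beyond the whiskers are drawn individually as outliers.
    outliers = scores[(scores < lower) | (scores > upper)]
    return lower, upper, outliers


# Hypothetical mean auROC values for one method, for illustration only.
scores = [0.70, 0.72, 0.71, 0.69, 0.73, 0.74, 0.70, 0.68, 0.72, 0.55]
lo, hi = bootstrap_median_ci(scores)
lower, upper, outliers = tukey_whiskers(scores)
print(f"median CI: [{lo:.3f}, {hi:.3f}], whiskers: [{lower}, {upper}], outliers: {outliers}")
```

For these made-up values, 0.55 lies more than 1.5 × IQR below the first quartile, so it would be drawn as an outlier marker rather than covered by the lower whisker.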
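Panel (D) contrasts a fixed batch size of 32 with an adaptive batch size of 1% of the dataset. One way to see what this choice implies is to count optimizer steps per epoch: under the adaptive scheme the count stays at 100 for every dataset size in the figure, whereas with a fixed batch size it grows linearly with the number of samples. The sketch below illustrates only that arithmetic; the helper names are hypothetical, it assumes the last partial batch is kept, and it does not model per-step compute cost or the authors' training code.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    # The last, possibly smaller, batch still counts as one optimizer step.
    return math.ceil(n_samples / batch_size)


def adaptive_batch_size(n_samples, fraction=0.01):
    # Batch size as a fixed fraction of the dataset, never below one sample.
    return max(1, round(n_samples * fraction))


for n in (100, 1_000, 10_000, 100_000):
    fixed = batches_per_epoch(n, 32)
    bs = adaptive_batch_size(n)
    adaptive = batches_per_epoch(n, bs)
    print(f"{n:>7} samples: batch size 32 -> {fixed:>5} batches/epoch, "
          f"batch size {bs} -> {adaptive} batches/epoch")
```

Over 200 epochs, a fixed batch size of 32 amounts to 625 000 steps for the 100 000-sample dataset, compared with 20 000 steps under the adaptive scheme.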
Article Snippet: We trained COmic models on six different public breast cancer Affymetrix HGU133A microarray datasets (GSE11121, GSE1456, GSE2034, GSE2990, GSE4922, and GSE7390) that were previously used to benchmark knowledge-based classification methods that use interaction network priors.
Techniques: